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4 . Statistical interpolation with least-squares < 

In this section we present the fundamental equation for linear analy 
form: the least squares estimation, also called Best Linear Unbiase 
following sections will provide more explanations and illustrations, 
least-squares estimation can be simplified to yield the most commo 
nowadays in meteorology and oceanography. 

4.1 Notation and hypotheses 

The dimension of the model state is « and the dimension of the ob 
will denote: 

x t true model state (dimension » ) 

x b background model state (dimension » ) 

x a analysis model state (dimension « ) 

y vector of observations (dimension p ) 

H observation operator (from dimension « -to p ) 

b covariance matrix of the background errors ( x b~ x t> (dime 
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R covariance matrix of observation errors (y-SUtl) (dimen 
a covariance matrix of the analysis errors ( x a -x t> (dimensio 

The following hypotheses are assumed: 

• Linearized observation operator: the variations of the ob 
vicinity of the background state are linear: for any x close eno 
H(x)-H(x b ) = H(x-x b ) w here h jsa linear operator. 

• Non-trivial errors: b and R are positive definite matrices. 

• Unbiased errors: the expectation of the background and o 

_ ^ __ _ ___ _ q 

• Uncorrelated errors: observation and background errors a 
i.e. (x b -x t )(y-Jf[yJ) T = 0 

• Linear analysis: we look for an analysis defined by correc 
which depend linearly on background observation departures 

• Optimal analysis: we look for an analysis state which is as 
true state in an r.m.s. sense (i.e. it is a minimum variance esti 

ref: Daley 1 991 ; Lorenc 1 986; GhH 1 989 
4.2 Theorem: least-squares analysis equations 

(a) The optimal least-squares estimator, or BLUE analysis, 
following interpolation equations: 

x a = x b + K(y-ff[x b ]) 
K = BH T (HBH T + R)" 1 

where the linear operator k js called the gain, or weight ma 
(a) The analysis error covariance matrix is, for any K : 

A = (I-KH)B(I-KH) T + KRK T 

If K js the optimal least-squares gain, the expression becom 

A = (I-KH)B 
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(a) The BLUE analysis is equivalently obtained as a solutio 
optimization problem: 

x a = Arg mine/ 

J(x) = (x-x b ) T B- L (x-x b ) + (y-i?[xJ) T R- l (y-j 
= J b (x) + J Q (x) 

where J is called the cost function of the analysis (or misfit 
is the background term, J o is the observation term. 

(a) The analysis x * is optimal: it is closest in an r.m.s. sense 

(b) If the background and observation error pdfs are Gauss 
maximum likelihood estimator of x t . 



Proof: 

With a translation of x by x b , we can assume that H = K s 
operator is linear for our purposes. The equation (A1) is simp 
expression of the fact that we want the analysis to depend lin 
departures. The expression of K in (A2) is well-defined becau 
matrix, and hbh t is positive. The minimization problem (A5) 
J o is a convex function and is a strictly convex function (it 

The equivalence between items (a) and (c) of the theorem s 
requirement that the gradient of J is zero at the optimum x * : 

VJfxJ = 0 = 2B- 1 (x a -x b )-2H T R- 1 (y-H[x a ]) 

0 = B- i (x a -x b )-H T R- L (y-ff[x b ])-H T ] 
(x.-Xb) = (B- l + H T R- L H)" L H T R- l (y-if[x b ]) 

The identity with (A2) is straightforward to prove (all inverse 
positive definite): 
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H T R" L (HBH T + R) = (B 1 + H T R* l H)BH T 
= H T +H T R -i HBH T 



hence 

(B" L + H T R" l H)" l H T R _1 = BH T (HBH T + R)~ 

The expressions (A3) and (A4) for A are obtained by rewriti 
(A1) in terms of the background, analysis and observation err 

e 0 = y-H[x t ] 
e 0 -£ b = K(e 0 -He b ) 
e a = (I-KH)e b + Ke 0 

By developing the expression of e a e„ and taking its expectat 

expectation operator one finds the general expression (A3) (r 
being uncorrelated, their cross-covariance is zero). The simpl 
derive by substituting the expression for the optimal K and si 
cancel. 

Finally to prove (A2) itself we take the analysis error covaria 
and we minimize its trace, i.e. the total error variance: (note th 

Ti(A) = Tr(B) + Tr(KHBH T K T )-2TT(BH T K T )+Ti 

This is a continuous differentiate scalar function of the coef 
express its derivative as the first- order terms in k of the d 
Tr(A)(K + L)-Tr(A)(K) , L being an arbitrary test matrix: 
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d K [Ti(A)]L = 2Tr(KHBH T L T ) -2Tr(BH T L T ) +2T: 
= 2Ti (KHBH T L T - B H T L T + KRL T ) 
= 2Tr{[K(HBH T + R)-BH T ]L T } 

The last line shows that the derivative is zero for any choice 
(HBH T + R)K T -HB = 0 , which is equivalent to 

K = BH T (HBH T + R) _l 

because (HBH T +R) is assumed to be invertible. 

In the case of Gaussian pdfs, one can model the background, obse 
as follows, respectively: ( & , ° and a are normalization factors.) 



which yields the right averages and covariances for the backgroun 
and the analysis error pdf is simply defined as the Bayesian produc 
sources of information, the background and the observation pdfs (t 
rigorously by using Bayes' theorem to write <£ \ as a conditional pro 
observations and the a priori pdf of the background). Then, by takin 
i£ J x ) , one finds that the model state with the maximum probability 
that minimizes the cost function expressed in the theorem. 

4.3 Comments 

The hypotheses of non-triviality can always been made in well-pos 
is non-positive, one can restrict the control space to the orthogonal 
analysis will not make any correction to background variables that 
is not a surjection, then some observations are redundant and the 
be restricted to the image of H . if R is non-positive, the expressio 
(then the analysis will be equal to the observed value at the observ 
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the variational version of the least-squares analysis cannot be use 
some algebraic precautions) to have some infinite eigenvalues in J 
which means that some observations are not used because their e 

The hypothesis of unbiased errors is a difficult one in practice beca 
significant biases in the background fields (caused by biases in the 
the observations (or in the observation operators). If the biases are 
subtracted from the background and observation values, and the a 
the debiased quantities. If the biases are left in, the analysis will no 
it will seem to reduce the biases by interpolating between the back 
It is important to monitor the biases in an assimilation system, e.g. 
background departures, but it is not trivial to decide which part of th 
observation biases. The problem of bias monitoring and removal is 
research. 

The hypothesis of uncorrelated errors is usually justified because t 
background and in the observations are supposed to be completel 
one must be careful about observation preprocessing practices (su 
procedures) that use the background field in a way that biases the 
background. It might reduce the apparent background departures, 
analysis to be suboptimal (too close to the background, a condition 
problem). 

The tangent linear hypothesis is not trivial and it is commented in t 

It is possible to rewrite the least-squares analysis equations in term 
error covariance matrices, called information matrices. It makes the 
complicated, but it allows one to see clearly that the information co 
the sum, in a simple sense, of the observations provided by the ba 
observations. This is illustrated in the section on the estimation of a 

It will be shown in the section on dual algorithms (PSAS analysis) t 
particular the cost function J , can be rewritten in the space of the 
easy to that least-squares analysis is closely related to a linear reg 
state and observations. 

4.4 On the tangent linear hypothesis 

The hypothesis of linearized observation operator is needed in ord 
algebraic expression for the optimal K . In practice, H may not be 
makes physical sense to linearize it in the vicinity of the backgroun 

ff(x)- J ff(x b )«H(x-x b ) 
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Then, K being a continuous function of H ( the least-squares equa 
should intuitively yield a nearly optimal x * . 

More generally, the tangent linear hypothesis on H can be written 
Young formula in the vicinity of an arbitrary state x and for a pertur 



H(x + h) = H[x) + Hh + 0(lhi') 



with lim^o 0(\\h{^)h " = o . This hypothesis, called the tangent tin 

acceptable if the higher-order variations of H can be neglected (in 
no discontinuities) for all perturbations of the model state which ha 
magnitude as the background errors. The operator H is called the 

derivative, or tangent linear (TL)- function of H at point x . Althoug 
mathematical property of H , it is not enough for practical purposes 
approximation 

H(x+h)-H(x)^Wi 

must be satisfactory, in user-defined terms, for finite values of A th 
application considered. In the least-squares analysis problem, we n 

y - H(x ) * y - H(x - x b ) + H (x b ) 

for all values of x that will be encountered in the analysis procedur 
and also all trial values used in the minimization of JOO if a variatio 
performe d-. Thus the important requirement is that the difference b 



should be much smaller than the typical observation erro 

model state perturbations x_x b of size and structure consistent wi 
errors, and also with the amplitude of the analysis increments x a _: 

Thus the problem of linearizing H is not just related to the observa 
must be appreciated in terms of the errors in the background x b too 
assimilation system are the previous forecast errors, which depend 
and the quality of the model. Ultimately the correctness of the linea 
appreciated in the context of the fully integrated assimilation system 
the linearization to a good system because the departures x_x b w 
the linearization may be inapplicable to difficult data assimilation pr 
case with ocean models or satellite data, which means that it can b 
sophisticated analysis algorithms that rely too much on the linearity 
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The linearization problem can be even more acute for the lineariza 
operator M which is needed in 4D-Var and in the Kalman filter des 
linearization of K , it may or may not be licit depending on the qua 
the assimilation system: data coverage, observation quality, model 
and forecast range. The user requirements and the physical prope 
be considered. 

The non-linear analysis problem 

The assumption of linear analysis is a strong one. Linear algebra is 
optimal analysis equations. One can rely on the linearization of a w 
observation operator, at the expense of optimality. The incrementa 
for the variational analysis) performs this procedure iteratively in an 
make the analysis more optimal. For strongly non-linear problems, 
simple way to calculate the optimal analysis. The simulated annea 
specific methods, such as the simplex, deal with variables with bou 
Finally, it is sometimes possible to make a problem more linear sim 
of model and observation variables (see the section on minimizatio 

4.5 The point of view of conditional probabilities 

It is interesting to formalize the analysis problem using the conditio 
probabilities. Let us denote P(x) the a priori pdf (probability density 
state before the observations are considered, i.e. the background p 
the pdf of the observations. The aim of the analysis is to find the m 
conditional probability of the model state given the observations. T 
(i.e. the probability that * and y occur together) is 

P(jca;v) = P(x\y)P(y) = P(y\x)P(x) 

i.e. it is the probability that * occurs when y occurs, and vice versa 
the Bayes theorem. In the analysis procedure we know that a mea 
and we know its value y , so P(y) = i and we obtain 

P(x\y) = P(y\x)P(x) 

which means that the analysis pdf is equal to the background pdf t 
P(y\x) . The latter peaks at y = H(x) but it is not a Dirac distributi 
observations are not error-free. 

The virtue of the probabilistic derivation of the analysis problem is t 
non-Gaussian probabilities (although this spoils the equivalence w 
K ). A practical application is done in the framework of variational q 
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assumed that observation errors are not Gaussian but they contain 
errors", i.e. there is a probability that the error is not generated by t 
physical processes but by some more serious problem, like coding 
The gross errors might be modelled using a uniform pdf over a pre 
admissible gross errors, leading to a non-Gaussian observation pd 
the logarithm of this pdf is taken, the resulting observation cost fun 
gives less weight to the observation (i.e. there is less slope) for mo 
strongly with the observed value. 

ref: Lorenc 1986 

4.6 Numerical cost of least-squares analysis 

In current operational meteorological models, the dimension of the 
precisely, of the control variable space) * is of the order of n = 10 
the observation vector (the number of observed scalars) is of the o 

analysis-. Therefore the analysis problem is mathematically underd 
some regions it might be overdetermined if the density of the obser 
resolution of the model). In any practical application it is essential t 
the matrix operators involved in computing the analysis (Fig. 4 ). Th 
method requires in principle the specification of covariance matrice 
inverses in the variational form of the algorithm) which respectively 
n /2 and p 2 /2 distinct coefficients, which are statistics to estimate 
variance or covariance statistic converges like the square root of th 
The explicit determination of K requires the inversion of a matrix o 
asymptotic complexity of the order of p 2 log(n) . The exact minimiz 
«f requires, in principle, n+ l evaluations of the cost function and i 
quadratic and there are no numerical errors (e.g. using a conjugate 
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x a =x b + K(y-Rx b ) 



( !- + M ) 



K=BH^HBH T +R)~ L 



H B H J 



£3 — 



J(x) = (x-^fB^x-Xb) + (y-Hx) T R *(y-Hx) 



Figure 4 . Sketches of the shapes of the matrices and vector d 
an usual analysis problem where there are many fewer observ 
freedom in the model: from top to bottom, in the equations of 
computation of K , of the HBH T term, and the computation of 



It is obvious that, except in analysis problems of very small dimens 
retrievals), it is impossible to compute exactly the least-squares an 
approximations are necessary, they are the subject of the following 

4.7 Conclusion 



We have seen that there are two main ways of defining the statistic 

• either assume that the background and error covariances a 
analysis equations by requiring that the total analysis error va 

• or assume that the background and observation error pdfs 
the analysis equations by looking for the state with the maxim 

Both approaches lead to two mathematically equivalent algorithms 

• the direct determination of the analysis gain matrix K t 
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• the minimization of a quadratic cost function. 

These algorithms have very different numerical properties, and the 
soon as some underlying hypotheses are not verified, like the linea 
operator, for instance. 
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Both qualifiers tangent and linear are needed: obviously H could 
satisfying the Taylor formula. A function can also be tangent to ano 
if the difference between them is an odAfn , e.g. * 2 and x 3 are tan 

x = 0 

2 Qualitatively speaking they all belong to a neighbourhood of ^ h 
which is consistent with the B and R error covariances. 

3 At ECMWF in winter 1998 the control variable dimension was 51 
observations (per 6-hour interval) was about 150000 
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